Take Home Exercise 3

Author

Georgia Ng

Published

October 17, 2024

Modified

October 23, 2024

1. Overview

1.1 Introduction

1.2 My Responsibilities

  • Data Preparation, Preprocessing

1.3 Importing Packages

Here, we have loaded the following packages:

pacman::p_load(sf, sfdep, tmap, tidyverse, RColorBrewer, ggplot2, spatstat, jsonlite, units, matrixStats, httr)

2. The Data

For this project, we will be using the following data sets.

  • Singapore Rental Flat Prices (Jan-17 to Sep-24) from data.gov.sg

  • Master Plan 2014 Subzone Boundary (Web) from data.gov.sg

  • Hawker Centres Dataset from data.gov.sg

  • Kindergarten, Childcare Datasets from OneMap API

  • Bus Stops Location, MRT/ LRT Locations from LTA Data Mall

  • Shopping Mall Coordinates through wikipedia and webscraping with the coordinates retrieved through OneMap API

2.1 Importing Geospatial Data

2.1.1 Importing Singapore Subzone Boundaries

The code chunk below is used to import MP_SUBZONE_WEB_PL shapefile by using st_read() of sfpackages.

mpsz_sf <- st_read(dsn = "data/geospatial/MasterPlan2014SubzoneBoundaryWebSHP/", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/MasterPlan2014SubzoneBoundaryWebSHP' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
write_rds(mpsz_sf, 'data/rds/mpsz_sf.rds')
st_crs(mpsz_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

2.1.1.1 Checking Validity of Geometries

Using st_is_valid, we can check to see whether all the polygons are valid or not. From the results, we can see a total of 9 not valid.

# checks for the number of geometries that are invalid
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 9

To rectify this, we can use st_make_valid() to correct these invalid geometries as demonstrated in the code chunk below.

mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))
[1] 0

Here is a plot of Singapore.

plot(mpsz_sf)

2.1.2 Importing Kindergartens

Click to view the code
kindergarten_json <- fromJSON("data/geospatial/kindergartens.json")

kindergarten_cleaned <- kindergarten_json$SrchResults[-1, ]

kindergarten_df <- data.frame(
  NAME = kindergarten_cleaned$NAME,
  latitude = sapply(kindergarten_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[1])),
  longitude = sapply(kindergarten_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[2]))
)

kindergarten_sf <- kindergarten_df %>%
  st_as_sf(coords = c("longitude", "latitude"), crs=4326) %>%
  st_transform(crs = 3414)
head(kindergarten_sf)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 19307.62 ymin: 38859.25 xmax: 30705.05 ymax: 46300.26
Projected CRS: SVY21 / Singapore TM
                                                        NAME
1 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 435 (KN)
2 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 533 (KN)
3 PCF Sparkletots Preschool @ Cheng San-Seletar Blk 556 (DS)
4        PCF Sparkletots Preschool @ Chong Pang Blk 107 (KN)
5        PCF Sparkletots Preschool @ Chong Pang Blk 122 (KN)
6      PCF Sparkletots Preschool @ Chua Chu Kang Blk 10 (KN)
                   geometry
1 POINT (30325.45 38859.25)
2 POINT (30190.51 39574.18)
3  POINT (30705.05 39337.9)
4 POINT (27354.73 45992.92)
5 POINT (27755.87 46300.26)
6 POINT (19307.62 40271.08)

2.1.3 Importing Childcare

Click to view the code
childcare_json <- fromJSON("data/geospatial/childcare.json")

childcare_cleaned <- childcare_json$SrchResults[-1, ]

childcare_df <- data.frame(
  NAME = childcare_cleaned$NAME,
  latitude = sapply(childcare_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[1])),
  longitude = sapply(childcare_cleaned$LatLng, function(x) as.numeric(unlist(strsplit(x, ","))[2]))
)

childcare_sf <- childcare_df %>%
  st_as_sf(coords = c("longitude", "latitude"), crs=4326) %>%
  st_transform(crs = 3414)
head(childcare_sf)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 17828.84 ymin: 29221.89 xmax: 40985.94 ymax: 45530.47
Projected CRS: SVY21 / Singapore TM
                                            NAME                  geometry
1 APOLLO INTERNATIONAL PRESCHOOL PRIVATE LIMITED POINT (40985.94 33848.38)
2                   APPLE TREE PLAYHOUSE PTE LTD POINT (28308.65 45530.47)
3 Appleland Montessori Child Care Centre Pte Ltd POINT (17828.84 36607.36)
4                            APPLELAND PLAYHOUSE POINT (25579.73 29221.89)
5             APRICOT ACADEMY (LAGUNA) PTE. LTD. POINT (38981.02 32483.41)
6                                Arise Preschool    POINT (21588.47 36307)

2.1.4 Importing Hawker Centre

Similarly here, we will use st_read to read the geojson information, however since the columns values are in the format of `<th>

` etc we will need to use a regex which is what we have done below to extract the name of the hawker centres.

Click to view the code
hawker_sf <- st_read('data/geospatial/HawkerCentresGEOJSON.geojson')
Reading layer `HawkerCentresGEOJSON' from data source 
  `/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/HawkerCentresGEOJSON.geojson' 
  using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449017
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
Click to view the code
# Function to extract name from description
extract_name <- function(description) {
  if (!is.na(description)) {
    # Use regular expression to extract the NAME 
    name <- sub(".*<th>NAME</th> <td>(.*?)</td>.*", "\\1", description)
    if (name == description) {
      return(NA)  # Return NA if no match is found
    }
    return(name)
  } else {
    return(NA) 
  }
}

# Apply the extraction function to every row
hawker_sf <- hawker_sf %>%
  mutate(Name = sapply(Description, extract_name)) %>% select (-Description)
head(hawker_sf)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.7798 ymin: 1.284425 xmax: 103.9048 ymax: 1.449017
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
                             Name                      geometry
1     Market Street Hawker Centre POINT Z (103.8502 1.284425 0)
2    Marsiling Mall Hawker Centre POINT Z (103.7798 1.433539 0)
3    Margaret Drive Hawker Centre POINT Z (103.8047 1.297486 0)
4 Fernvale Hawker Centre & Market POINT Z (103.8771 1.391592 0)
5       One Punggol Hawker Centre  POINT Z (103.9048 1.40819 0)
6    Bukit Canberra Hawker Centre POINT Z (103.8225 1.449017 0)

As shown above, we can see that the geographic coordinate system for the hawker dataset is in WGS84 and has XYZ coordinates, among which contains the Z-coordinates we do not need. Thus, we can use st_zm() to remove the Z-coordinate and project it to the SVY21 coordiate system using st_transform().

hawker_sf <- st_zm(hawker_sf) %>%
  st_transform(crs = 3414)

head(hawker_sf)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22042.51 ymin: 29650.7 xmax: 35955.52 ymax: 47850.43
Projected CRS: SVY21 / Singapore TM
                             Name                  geometry
1     Market Street Hawker Centre  POINT (29874.82 29650.7)
2    Marsiling Mall Hawker Centre POINT (22042.51 46139.03)
3    Margaret Drive Hawker Centre  POINT (24816.7 31094.91)
4 Fernvale Hawker Centre & Market  POINT (32867.9 41500.77)
5       One Punggol Hawker Centre POINT (35955.52 43336.13)
6    Bukit Canberra Hawker Centre POINT (26794.39 47850.43)
Click to view the code
tmap_mode('plot')
tm_shape(mpsz_sf) +
  tm_polygons()+
  tm_shape(hawker_sf) +
  tm_dots(col='red')

2.1.5 Importing Bus Stops

busstop_sf <- st_read(dsn = "data/geospatial/BusStopLocation_Jul2024/", layer = "BusStop")%>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/BusStopLocation_Jul2024' 
  using driver `ESRI Shapefile'
Simple feature collection with 5166 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48285.52 ymax: 52983.82
Projected CRS: SVY21
head(busstop_sf)
Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 20134.2 ymin: 31253.63 xmax: 35565.66 ymax: 41659.52
Projected CRS: SVY21 / Singapore TM
  BUS_STOP_N BUS_ROOF_N            LOC_DESC                  geometry
1      65059        B12        ST ANNE'S CH POINT (35565.66 41659.52)
2      16171        B06     YUSOF ISHAK HSE POINT (21439.91 31253.63)
3      61101        NIL             BLK 120 POINT (31381.06 35313.49)
4      01239        B01        SULTAN PLAZA POINT (31152.55 31688.08)
5      17269        B01             BLK 730  POINT (20134.2 31917.38)
6      11291        B17 COLD STORAGE JELITA POINT (22723.57 33349.85)

2.1.6 Importing Shopping Malls

shoppingmall_sf <- read_csv('data/geospatial/shopping_mall_coordinates.csv') %>%
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_transform(crs = 3414)
head(shoppingmall_sf)
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28561.14 ymin: 28563.01 xmax: 31449.41 ymax: 34203.18
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 2
  `Mall Name`                               geometry
  <chr>                                  <POINT [m]>
1 100 AM                         (29129.86 28563.01)
2 313@Somerset                   (28561.14 31485.08)
3 Aperia                         (31449.41 32531.06)
4 Balestier Hill Shopping Centre (29029.77 34203.18)
5 Bugis Cube                     (30483.66 31167.35)
6 Bugis Junction                 (30458.71 31274.83)

2.1.7 Importing MRT

mrt_sf <- st_read(dsn = "data/geospatial/TrainStation_Jul2024/", layer = "RapidTransitSystemStation")
Reading layer `RapidTransitSystemStation' from data source 
  `/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex03/data/geospatial/TrainStation_Jul2024' 
  using driver `ESRI Shapefile'
Simple feature collection with 230 features and 5 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21

Having imported the dataset, we will now need to check for both invalid geometries and NA values before proceeding. The chunk of code detects not only these but also resolves it.

# Check for invalid geometries and NA values
validity_checks <- st_is_valid(mrt_sf, reason = TRUE)

# Identify indices with NA
na_indices <- which(is.na(validity_checks))

# Filter out rows with NA values from the mrt object
mrt_sf <- mrt_sf[-na_indices, ]

# Verify the mrt object no longer contains invalid geometries
any(is.na(sf::st_is_valid(mrt_sf)))
[1] FALSE

Here we use st_transform() to convert it to the SVY21 Coordinates System of CRS code 3414.

mrt_sf <- mrt_sf %>%
  st_transform(crs = 3414)
tmap_mode('plot')
tm_shape(mpsz_sf)+
  tm_polygons() +
  tm_shape(mrt_sf) +
  tm_dots(col='red')

2.1.8 Importing Primary School

This chunk of code imports the primary school dataset from data.gov.sg and uses the select() function to select the relevant columns through the input of the column numbers.

primarysch_df = read_csv('data/geospatial/Generalinformationofschools.csv') %>% filter(mainlevel_code =='PRIMARY') %>% select(1,3,4)

2.1.8.1 Geocoding Primary School Data using OneMap API

Click to view the code
geocode <- function(address, postal) {
  base_url <- "https://www.onemap.gov.sg/api/common/elastic/search"
  query <- list("searchVal" = address,
                "postal" = postal,
                "returnGeom" = "Y",
                "getAddrDetails" = "N",
                "pageNum" = "1")
  
  res <- GET(base_url, query = query)
  restext<-content(res, as="text")
  
  output <- fromJSON(restext)  %>% 
    as.data.frame %>%
    select(results.LATITUDE, results.LONGITUDE)

  return(output)
}
Click to view the code
primarysch_df$LATITUDE <- 0
primarysch_df$LONGITUDE <- 0

for (i in 1:nrow(primarysch_df)){
  temp_output <- geocode(primarysch_df[i, 2], primarysch_df[i, 3])
  print(i)
  
  primarysch_df$LATITUDE[i] <- temp_output$results.LATITUDE
  primarysch_df$LONGITUDE[i] <- temp_output$results.LONGITUDE
}
write_rds(primarysch_df, 'data/rds/geocoded_primarysch.rds')

As shown below, using head() we can see that the new columns for lat and long has been added with the values fetched using the OneMap API.

head(primarysch_df)
# A tibble: 6 × 3
  school_name                  address                      postal_code
  <chr>                        <chr>                              <dbl>
1 ADMIRALTY PRIMARY SCHOOL     11   WOODLANDS CIRCLE             738907
2 AHMAD IBRAHIM PRIMARY SCHOOL 10   YISHUN STREET 11             768643
3 AI TONG SCHOOL               100  Bright Hill Drive            579646
4 ALEXANDRA PRIMARY SCHOOL     2A   Prince Charles Crescent      159016
5 ANCHOR GREEN PRIMARY SCHOOL  31   Anchorvale Drive             544969
6 ANDERSON PRIMARY SCHOOL      19   ANG MO KIO AVE 9             569785

Using read_rds, we can access the already processed and geocoded data from rds without needing to run through the geocoding function again. Since the data is in the WGS coordinate system, we can use st_transform() to project it to the SVY21 coordinate system we will be using.

primarysch_df <- read_rds('data/rds/geocoded_primarysch.rds')
primarysch_sf <- primarysch_df %>%
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_transform(crs = 3414)
Click to view the code
tmap_mode('plot')
tm_shape(mpsz_sf)+
  tm_polygons() +
  tm_shape(primarysch_sf) +
  tm_dots(col='red')

2.1.9 Inferring CBD

Finally, let us factor in the proximity to the Central Business District - in the Downtown Core. For this, let us take the coordinates of Downtown Core to be the coordinates of the CBD:

lat <- 1.287953
lng <- 103.851784

cbd_sf <- data.frame(lat, lng) %>%
  st_as_sf(coords = c("lng", "lat"), crs=4326) %>%
  st_transform(crs=3414)

2.2 Importing Aspatial Data

2.2.1 Importing Rental Flat

The code chunk below is used to import the rental

rental_df = read_csv('data/aspatial/RentingOutofFlats2024CSV.csv')

To get a brief overview of existing columns of this dataset, we can use colnames() to do so.

colnames(rental_df)
[1] "rent_approval_date" "town"               "block"             
[4] "street_name"        "flat_type"          "monthly_rent"      

2.2.1.1 Converting rent_approval_date to a Valid Date Format

rental_df$rent_approval_date <- ym(rental_df$rent_approval_date)

2.2.1.2 Filtering For Selected Time Frame

rental_df <- rental_df %>%
  filter(year(rent_approval_date) == 2024)

2.2.1.3 Geocoding Rental Flat Data Using OneMap API

geocode <- function(block, streetname) {
  base_url <- "https://www.onemap.gov.sg/api/common/elastic/search"
  address <- paste(block, streetname, sep = " ")
  query <- list("searchVal" = address, 
                "returnGeom" = "Y",
                "getAddrDetails" = "N",
                "pageNum" = "1")
  
  res <- GET(base_url, query = query)
  restext<-content(res, as="text")
  
  output <- fromJSON(restext)  %>% 
    as.data.frame %>%
    select(results.LATITUDE, results.LONGITUDE)

  return(output)
}
rental_df$LATITUDE <- 0
rental_df$LONGITUDE <- 0

for (i in 1:nrow(rental_df)){
  temp_output <- geocode(rental_df[i, 3], rental_df[i, 4])
  print(i)
  
  rental_df$LATITUDE[i] <- temp_output$results.LATITUDE
  rental_df$LONGITUDE[i] <- temp_output$results.LONGITUDE
}
write_rds(rental_df, 'data/rds/geocoded_rental_2024.rds')
rental_df <- read_rds('data/rds/geocoded_rental_2024.rds')

2.2.1.4 CRS Adjustments

Another important step after importing the dataset is checking the coordinate system used, as seen in the result below using st_crs(), we can see that there is no CRS.

st_crs(rental_df)
Coordinate Reference System: NA

Therefore, we need to convert the longitude and latitude columns into a spatial format. Since our dataset is based in Singapore and it uses the SVY21 coordinate reference system (CRS Code: 3414), we will use the st_transform() function to perform the conversion and create the geometry column.

rental_sf <- rental_df %>%
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_transform(crs = 3414)

Using st_crs(), we can check and verify that the conversion is successful.

st_crs(rental_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
head(rental_sf)
Simple feature collection with 6 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 15786.61 ymin: 30769.77 xmax: 39668.39 ymax: 45634.94
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 7
  rent_approval_date town            block street_name    flat_type monthly_rent
  <date>             <chr>           <chr> <chr>          <chr>            <dbl>
1 2024-01-01         YISHUN          386   YISHUN RING RD 4-ROOM            3700
2 2024-01-01         JURONG WEST     140B  CORPORATION DR 4-ROOM            3900
3 2024-01-01         SENGKANG        471B  FERNVALE ST    5-ROOM            3700
4 2024-01-01         KALLANG/WHAMPOA 10    GLOUCESTER RD  3-ROOM            3600
5 2024-01-01         BEDOK           31    BEDOK STH AVE… 5-ROOM            4350
6 2024-01-01         QUEENSTOWN      82    STRATHMORE AVE 4-ROOM            3000
# ℹ 1 more variable: geometry <POINT [m]>
tmap_mode('plot')
tm_shape(mpsz_sf) +
  tm_polygons()+
  tm_shape(rental_sf) +
  tm_dots()

2.2.1.5 Checking for NA values

This chunk of code checks the dataset for any na values in all of the columns. As shown below, there is none.

rental_sf %>%
  summarise(across(everything(), ~ sum(is.na(.)))) -> extra_NA 
extra_NA
Simple feature collection with 1 feature and 6 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
# A tibble: 1 × 7
  rent_approval_date  town block street_name flat_type monthly_rent
               <int> <int> <int>       <int>     <int>        <int>
1                  0     0     0           0         0            0
# ℹ 1 more variable: geometry <MULTIPOINT [m]>

3. Data Wrangling

3.1 Removal of Redundant Columns

# Define columns to be removed
columns_to_remove <- c("block","street_name")

# Remove columns only if they exist in the dataframe
rental_sf <- rental_sf %>%
  dplyr::select(-all_of(columns_to_remove[columns_to_remove %in% names(rental_sf)]))

3.2 Filter By Flat Type

Let us get an overview of the distributions of the housing types. As shown in the histogram, we can see that there is significantly less data for flat types like 1-room, 2-room, and executive housing.

# Create a summary of counts for each remaining lease range
count_data <- rental_sf %>%
  group_by(flat_type) %>%
  summarise(count = n())

# Create the bar plot with labels
ggplot(count_data, aes(x = flat_type, y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") + 
  geom_text(aes(label = count), vjust = -0.5, size = 4) +  # Add labels on top of the bars
  labs(title = "Count of Flat Type",
       x = "Flat Type",
       y = "Count") +
  theme_minimal()

Hence, we will focus on analyzing the 3-room, 4-room, and 5-room flats since they show a more substantial presence in the dataset compared to smaller flat types.

rental_sf <- rental_sf %>% filter (flat_type == '3-ROOM' | flat_type == '4-ROOM' |flat_type == '5-ROOM' )

3.3 Calculate Number of Amenities Within 1km & Proximity To Nearest Amenity

Click to view the code
calculate_amenities_and_proximity <- function(dataset1, dataset2, name_of_col_amenities, name_of_col_proximity, radius, calculateNumberOfAmenities) {
  # Calculate distance matrix
  dist_matrix <- st_distance(dataset1, dataset2) %>%
    drop_units()
  
  if (calculateNumberOfAmenities){
  # Calculate the number of amenities within the specified radius
    dataset1[[name_of_col_amenities]] <- rowSums(dist_matrix <= radius)
  }
  # Calculate the proximity to the nearest amenity
  dataset1[[name_of_col_proximity]] <- rowMins(dist_matrix)
  
  return(dataset1)
}
Click to view the code
rental_sf <- 
  calculate_amenities_and_proximity(
    rental_sf, kindergarten_sf, "no_of_kindergarten_500m", "prox_kindergarten", 500, TRUE
  ) %>%
  calculate_amenities_and_proximity(
    ., childcare_sf, "no_of_childcare_500m", "prox_childcare", 500, TRUE
  ) %>%
  calculate_amenities_and_proximity(
    ., hawker_sf, "no_of_hawker_500m", "prox_hawker", 500, TRUE
  ) %>%
  calculate_amenities_and_proximity(
    ., busstop_sf, "no_of_busstop_500m", "prox_busstop", 500, TRUE
  ) %>%
  calculate_amenities_and_proximity(
    ., shoppingmall_sf, "no_of_shoppingmall_1km", "prox_shoppingmall", 1000, TRUE
  ) %>% 
  calculate_amenities_and_proximity(
    ., mrt_sf, "x", "prox_mrt", 1000, FALSE
  ) %>%
  calculate_amenities_and_proximity(
    ., primarysch_sf, "x", "prox_prisch", 1000, FALSE
  ) %>%
  calculate_amenities_and_proximity(
    ., cbd_sf, "x", "prox_cbd", 1000, FALSE
  )


# Writing to RDS
write_rds(rental_sf,'data/rds/rental_sf.rds')
rental_sf <- read_rds('data/rds/rental_sf.rds')
head(rental_sf)
Simple feature collection with 6 features and 17 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 15786.61 ymin: 30769.77 xmax: 39668.39 ymax: 45634.94
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 18
  rent_approval_date town       flat_type monthly_rent            geometry
  <date>             <chr>      <chr>            <dbl>         <POINT [m]>
1 2024-01-01         YISHUN     4-ROOM            3700 (29463.95 45634.94)
2 2024-01-01         JURONG WE… 4-ROOM            3900 (15786.61 34389.18)
3 2024-01-01         SENGKANG   5-ROOM            3700  (33244.8 42043.31)
4 2024-01-01         KALLANG/W… 3-ROOM            3600 (30102.41 32911.75)
5 2024-01-01         BEDOK      5-ROOM            4350 (39668.39 33918.01)
6 2024-01-01         QUEENSTOWN 4-ROOM            3000 (25265.85 30769.77)
# ℹ 13 more variables: no_of_kindergarten_500m <dbl>, prox_kindergarten <dbl>,
#   no_of_childcare_500m <dbl>, prox_childcare <dbl>, no_of_hawker_500m <dbl>,
#   prox_hawker <dbl>, no_of_busstop_500m <dbl>, prox_busstop <dbl>,
#   no_of_shoppingmall_1km <dbl>, prox_shoppingmall <dbl>, prox_mrt <dbl>,
#   prox_prisch <dbl>, prox_cbd <dbl>

4. Overview Of Dataset

colnames(rental_sf)
 [1] "rent_approval_date"      "town"                   
 [3] "flat_type"               "monthly_rent"           
 [5] "geometry"                "no_of_kindergarten_500m"
 [7] "prox_kindergarten"       "no_of_childcare_500m"   
 [9] "prox_childcare"          "no_of_hawker_500m"      
[11] "prox_hawker"             "no_of_busstop_500m"     
[13] "prox_busstop"            "no_of_shoppingmall_1km" 
[15] "prox_shoppingmall"       "prox_mrt"               
[17] "prox_prisch"             "prox_cbd"               

Explanatory Variables:

Continuous

  • Distance to transport: distance_to_mrt_meters

  • Distance to amenities: distance_to_pri_school_meters

  • Distance to central business district: distance_to_cbd

  • Flat Type: flat_type

  • Prox_to:

Categorical

  • Remaining Lease: remaining_lease_range

  • Storey Height: storey_range

Dependent Variables:

  • Monthly Rental: monthly_rental, price_per_sqft

3. Shiny Storyboard (EDA)

3.1 Distribution

3.2

4. Distribution

4.1 Categorical Variables

4.1.1 Housing Type

# Create a summary of counts for each remaining lease range
count_data <- rental_sf %>%
  group_by(flat_type) %>%
  summarise(count = n())

# Create the bar plot with labels
ggplot(count_data, aes(x = flat_type, y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") + 
  geom_text(aes(label = count), vjust = -0.5, size = 4) +  # Add labels on top of the bars
  labs(title = "Count of Flat Type",
       x = "Flat Type",
       y = "Count") +
  theme_minimal()

Bivariate Analysis

Correlation Matrix

Drafts

mpsz_sf_main <- st_union(mpsz_sf) %>%
    st_cast("POLYGON")
mpsz_sf_main <- mpsz_sf_main[c(12)]

mpsz_sf_owin <- as.owin(mpsz_sf_main)
plot(mpsz_sf_owin)

tmap_mode('plot')
tm_shape(mpsz_sf%>% filter(PLN_AREA_N == 'ANG MO KIO'))+
  tm_polygons()+
tm_shape(rental_sf %>% filter(planning_area_ura == 'ANG MO KIO'))+
  tm_dots()